Data integration method

ABSTRACT

A data integration method involves a unique method of collecting raw business data and processing it to produce highly useful and highly accurate information to enable business decisions. This process includes collecting global data, entity matching, applying an identification number, performing corporate linkage, and providing predictive indicators. These process steps work in series to filter and organize the raw business data and provide quality information to customers. In addition, the information is enhanced by quality assurance at each step in this process to ensure the high quality of the resulting data.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation-in-part of and claims the benefit of application Ser. No. 10/368072, filed Feb. 18, 2003, entitled “Data Integration Method,” which is currently pending.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a process of collecting and enhancing commercial data and, more particularly, to quality assurance and five quality drivers.

2. Description of the Related Art

To be successful, businesses need to make informed decisions. In risk management, businesses need to understand and manage total risk exposure. They need to identify and aggressively collect on high-risk accounts. In addition, they need to approve or grant credit quickly and consistently. They also need to verify prospect, customer and supplier data to ensure compliance with government regulations. In sales and marketing, businesses need to determine the most profitable customers and prospects to target, as well as incremental opportunity in an existing customer base. They need to understand who and how big their most important customers are, acquire new high-growth customers that look like their best customers and reallocate their sales force based on growth and opportunity. In supply management, businesses need to understand the total amount being spent with suppliers to negotiate better. They also need to uncover risks and dependencies on suppliers to reduce exposure to supplier failure.

The success of these business decisions depends largely on the quality of the information behind them. Quality is determined by whether the information is accurate, complete, timely, and Cross-Border Consistent. Accuracy is defined as having the right information on the right business. Completeness is defined as providing breadth and depth of data. Timeliness is making frequent updates to keep the information fresh. Cross-Border Consistency is providing consistent data across the globe. With thousands of sources of data available, it is a challenge to determine which is the quality information a business should rely on to make decisions. This is particularly true when businesses change so frequently. In the next 60 minutes in the U.S., 251 businesses will have a suit, lien, or judgment filed against them, 58 business addresses will change, 246 business telephone numbers will change or be disconnected, 81 directorship (CEO, CFO, etc.) changes will occur, 41 new businesses will open their doors, 7 corporations will file for bankruptcy, and 11 companies will change their name.

Conventional methods of providing business data are incomplete. Some providers collect incomplete data, fail to completely match entities, have incomplete numbering systems that recycle numbers, fail to provide corporate family information or provide incomplete corporate family information, and merely provide incomplete value-added predictive data. It is an object of the present invention to provide more complete, timely, accurate, and consistent business data. This includes data collection, entity matching, identification number assignment, corporate linkage, and predictive indicators. This produces high quality business information that provides insights so businesses can trust and decide with confidence.

SUMMARY OF THE INVENTION

One aspect of the present invention is a method of data integration comprising collecting information comprising primary data. The primary data is tested for accuracy and processed to produce secondary data, and enhanced information comprising the primary data and the secondary data is provided. In some embodiments, primary and/or secondary data is sampled periodically, thereby generating sample data. The sample data is evaluated against at least one predetermined condition. Based upon this evaluation, testing and/or processing steps are adjusted.

In some embodiments, testing comprises at least one of the following steps: (a) determining if the primary data matches stored data and (b) assigning an identification number to the primary data. If the primary data does not match the stored data in step (a), it is determined whether the primary data meets a first threshold condition before an identification number is assigned in step (b). The first threshold condition is that multiple sources confirm that a business associated with the primary data exists. The identification number is an entity identifier. If the primary data does not meet the first threshold condition, it is stored in a separate repository and assigned an identification number. When additional primary data is received, it is determined whether the primary data and the additional primary data together meet the first threshold condition; if they do, the entity is moved into the multi-source repository.

Another aspect of the present invention is a system for data integration. The system includes a data generator, a testing unit, a first processing unit, and a second processing unit. The data generator is capable of gathering primary data from at least one data source. The testing unit is capable of testing the primary data for accuracy. The first processing unit is capable of analyzing the primary data and generating secondary data from the result of the analysis. The second processing unit is capable of merging the primary data and the secondary data to form enhanced information. The testing unit, first processing unit, and the second processing unit may be the same or independent of one another. In some embodiments, the testing unit comprises at least one of a data matching unit and entity identifier unit. The first processing unit comprises at least one of a corporate linkage unit and a predictive indicator unit.

Another aspect of the present invention is a machine-readable medium for storing executable instructions for data integration. The instructions include collecting information comprising primary data, testing the primary data for accuracy, processing the primary data to produce secondary data, and providing enhanced information comprising the primary data and the secondary data.

In some embodiments, the primary and/or secondary data is sampled periodically, thereby generating sample data. The sample data is evaluated against at least one predetermined condition. The testing and/or processing is adjusted based upon the evaluation.
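The sample, evaluate, and adjust loop described above can be illustrated with a minimal Python sketch. The function names, the pass-rate condition, and the idea of tightening a numeric match threshold are assumptions made for illustration only; the specification does not prescribe a particular condition or adjustment.

```python
def take_sample(records, every_nth):
    """Draw a systematic sample: every n-th record from the batch."""
    return records[::every_nth]


def evaluate_sample(sample, is_acceptable, min_pass_rate):
    """Evaluate the sample against a predetermined condition.

    Returns (pass_rate, meets_condition). The condition used here,
    a minimum share of acceptable records, is a hypothetical example.
    """
    if not sample:
        return 1.0, True
    passed = sum(1 for record in sample if is_acceptable(record))
    pass_rate = passed / len(sample)
    return pass_rate, pass_rate >= min_pass_rate


def adjust_threshold(current, meets_condition, step=0.05, ceiling=0.99):
    """Tighten a (hypothetical) testing threshold when the sample fails."""
    if meets_condition:
        return current
    return min(ceiling, round(current + step, 2))
```

In this sketch a failed evaluation only raises one threshold; an actual embodiment could adjust any testing or processing step.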

These and other features, aspects, and advantages of the present invention will become better understood with reference to the drawings, description, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the method of data integration according to the present invention;

FIG. 2 is a block diagram of a system for data integration according to the present invention;

FIG. 3 is a block diagram of a system for data integration according to the present invention;

FIG. 4 is a logic diagram depicting the method of data integration according to the present invention;

FIG. 5 is a block diagram of example sources of data collection according to the present invention;

FIG. 6 is a block diagram of more example sources of data collection according to the present invention;

FIGS. 7 and 8 are block diagrams of entity matching according to the present invention;

FIG. 9 is a block diagram of entity matching where no match is found in existing databases of traditional businesses, but through sourcing internal and external data stores an emerging business match is made;

FIG. 10 is a block diagram of entity matching where matched data is delivered to one database or matched through internal and external data sources or assigned an identification number and housed in a single source repository;

FIGS. 11 and 12 are block diagrams of a method of entity matching according to the present invention;

FIGS. 13-15 are block diagrams of corporate linking according to the present invention;

FIG. 16 is a block diagram of corporate linkage following a merger/acquisition event;

FIG. 17 is a logic diagram of an example method of performing corporate linkage according to the present invention;

FIG. 18 is a block diagram of corporate linkage where relationships are outside of the definition of legal ownership to show other types of linkages; and

FIGS. 19 and 20 are block diagrams of an example method of providing a predictive indicator according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

In the following detailed description, reference is made to the accompanying drawings. These drawings form a part of this specification and show, by way of example, specific preferred embodiments in which the present invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the present invention. Other embodiments may be used and structural, logical, and electrical changes may be made without departing from the spirit and scope of the present invention. Therefore, the following detailed description is not to be taken in a limiting sense and the scope of the present invention is defined only by the appended claims.

FIG. 1 shows an overview of a method of data processing according to the present invention. The foundation of the method is quality assurance 102, which is the continuous data auditing, validating, normalizing, correcting, and updating done to ensure quality all along the process. There are five quality drivers that work sequentially to enhance the incoming data 104 to turn it into quality information 106. These five drivers are: a data collection driver 108, an entity matching driver 110, an identification (ID) number driver 112, a corporate linkage driver 114, and a predictive indicators driver 116. These five drivers interface with a database 118. Database 118 is an organized collection of data and database management tools, such as a relational database, an object-oriented database, or any other kind of database. Data in database 118 is continually refined and enhanced based on customer feedback and quality assurance testing and procedures.

Data collection driver 108 brings together data from a variety of sources worldwide. Then, the data is integrated into database 118 through entity matching driver 110, resulting in a single, more accurate picture of each business entity. Next, identification number driver 112 applies an identification number as a unique means of identifying and tracking a business globally through any changes it goes through. Corporate linkage driver 114 then builds corporate families to enable a view of total corporate risk and opportunity. Finally, predictive indicators driver 116 uses statistical analysis to rate a business' past performance and indicate the likelihood of a business to perform in a specific way in the future.
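The sequential arrangement of the five drivers, with a quality assurance check after each, can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the claimed implementation: the stage functions, field names, and checks in `EXAMPLE_STAGES` are placeholders invented for the example.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Each stage is (driver name, transform step, quality assurance check).
Stage = Tuple[str, Callable[[Record], Record], Callable[[Record], bool]]


def run_drivers(record: Record, stages: List[Stage]) -> Tuple[Record, List[Tuple[str, bool]]]:
    """Apply each driver in order and record the QA result after each one."""
    audit: List[Tuple[str, bool]] = []
    current = dict(record)  # work on a copy so the caller's data is untouched
    for name, transform, check in stages:
        current = transform(current)
        passed = check(current)
        audit.append((name, passed))
        if not passed:
            break  # hold the record for review rather than pass it downstream
    return current, audit


# Placeholder stages; real drivers would consult database 118.
EXAMPLE_STAGES: List[Stage] = [
    ("data collection", lambda r: {**r, "sources": ["registry", "trade"]},
     lambda r: len(r["sources"]) > 0),
    ("entity matching", lambda r: {**r, "matched": False},
     lambda r: "matched" in r),
    ("identification number", lambda r: {**r, "entity_id": "000000001"},
     lambda r: len(r["entity_id"]) == 9),
    ("corporate linkage", lambda r: {**r, "parent_id": None},
     lambda r: "parent_id" in r),
    ("predictive indicators", lambda r: {**r, "score": 0.5},
     lambda r: 0.0 <= r["score"] <= 1.0),
]
```

Stopping at the first failed check is one possible design choice; the description only requires that quality assurance accompany every driver.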

FIGS. 2 and 3 show two example embodiments of systems for data integration according to the present invention, although other systems would also be suitable for practicing the present invention. FIG. 2 shows a network configuration while FIG. 3 shows a computer system configuration. In FIG. 2, a network 200 facilitates communication among the other system components, including a computer system 202. The five quality drivers, data collection driver 108, entity matching driver 110, identification number driver 112, corporate linkage driver 114, and predictive indicators driver 116, and quality assurance 102 work sequentially to enhance the incoming data 104 to turn it into quality information 106 stored in database 204. In FIG. 3, a computer system 300 has a processor 302 with access to memory 304 via a bus 306. Memory 304 stores an operating system program 308, a data integration program 310, and data 312.

FIG. 4 shows detail around Quality Assurance for each driver as another embodiment of a method of data integration according to the present invention. This method includes five main drivers of data integration: data collection 108, entity matching 110, identification number 112, corporate linkage 114, and predictive indicators 116 to produce quality information 106. Quality information 410 is produced as a result of quality assurance performed by each driver.

For data collection 108, a very large amount of global data is collected from a variety of sources for increased accuracy. Quality assurance 400 is performed for data collection 108 to verify legal name and ownership to identify potential fraud, to update contact information, to update and make changes based on events, to verify and enhance third party information, and to ensure accuracy, completeness, timeliness and cross-border consistency. Quality assurance 400 continually refines and enhances data collection 108.

For entity matching 110, incoming data is matched to data in database 118. Quality assurance 402 is performed for entity matching with manual and automated quality checks to ensure accurate matches and eliminate duplicates. Based on customer feedback and matching learnings, quality assurance 402 for entity matching 110 is continually refined and enhanced.

For identification number 112, businesses are uniquely identified and tracked. Quality assurance 404 is performed for identification number 112 by retaining an identification number for the life of a business and by being recognized as an industry standard. The identification number allows verification of information in each of the five drivers. For data collection 108, if data is not linked to an identification number, it indicates the possibility of a new business. For entity matching 110, the identification number allows new data to be accurately matched to existing businesses. For corporate linkage 114, corporate families are assembled based on each business' identification number. For predictive indicators 116, numbered data is used to build predictive tools. A verification process assigns an identification number when commercial activity is confirmed. Quality assurance 404 for identification number 112 includes validating and protecting against duplication. The identification number assignment process is continually refined and enhanced.

For corporate linkage 114, corporate families are built to provide a view of total risk and opportunity. Quality assurance 406 for corporate linkage 114 includes building corporate families globally and updating them after mergers, acquisitions, and other events. Quality assurance 406 for corporate linkage 114 includes increasing completeness and accuracy of corporate families by having a dedicated team review corporate families and by matching corporate families. Based on customer feedback, the corporate linkage 114 is continually refined and enhanced.

For predictive indicators 116, statistical analysis is used to indicate the likelihood of a business to perform in a specific way in the future. Quality assurance 408 for predictive indicators 116 includes continually monitoring and adjusting predictive indicators 116 to reflect new information. Based on customer feedback, the predictive indicators 116 are continually refined and enhanced.

Thus, the five main components or drivers work together to integrate the data collected into quality information 106 that is useful for making business decisions. The process is continually enhanced to improve quality based on feedback, learnings and experience spanning the past 160 years. Each of the five drivers is examined in more detail below, starting with data collection driver 108.

Global Data Collection

FIG. 5 shows some sources of data used in data collection driver 108. Data is collected about customers, prospects, and suppliers with the goal of collecting the most complete data possible. Preferably, database 118 is a global database. For example, database 118 has data for millions of businesses worldwide and is updated daily. In this example, database 118 contains direct investigations, news, and media 502, payment and financial data (trade data) 504, public records and government registries 506, and web sources and directories 508. Payment and financial data includes trade records updated frequently, complete coverage of public company financials, and coverage of financial statements on privately held companies. Public records and government registries include suits, liens, judgments, uniform commercial code filings, bankruptcy filings, and business registrations. Web sources and directories include uniform resource locators (URLs), updates from domains, and customers providing online updates. Data can also be collected from other strategic data partners. These strategic partners provide data from international markets and conduct an agreed upon amount of due diligence on the data, prior to delivery into the global database. The inclusion of data from strategic partners enables comprehensive global coverage.

In an example database 118, top news providers are monitored every day to uncover changes and updates that affect the risk level and/or marketing attributes of the user's customers, prospects, and suppliers. This data is focused on publicly traded companies with additional coverage devoted to mergers and/or acquisitions and high risk or business deterioration. News is posted within 24 hours of release. The types of events include mergers and acquisitions, control changes, purchase or sale of assets, officer, name or location changes, earnings updates, and business closings. The benefit is updated information that affects the risk level of companies the user does business with and indications of key changes that can be used for marketing purposes.

In an example database 118, payment experiences from companies are collected to help the user predict future payment habits of prospects and customers. Accounts receivable data on U.S.-based companies provides an overall evaluation of how quickly and completely a company made payment to each vendor. Many reports created using database 118 include payment data. This payment experiences data has many benefits. Users get a picture of how a company is paying its vendors, bank loans, and other financial obligations. It enables showing payment trends over time. It enables creation of predictive scores for use in applications such as automated credit approvals. It helps pre-screen potential customers based on their ability to pay on time. Payment experiences are summarized to show the user how different industries are paid and credit limits.

An example database 118 has public records from U.S. courts and legal filing offices to provide critical insights into the risk of a company. This data includes U.S.-based company information on suits, liens, judgments, bankruptcies, and U.C.C. filings (collectively called public records), information obtained from courts and recording offices, and company filings for bankruptcy protection under Chapter 11 (re-organization) or Chapter 7 (liquidation). This data captures a majority of the U.S. public filings and has many benefits. Over 10 years of historical coverage enable predictive credit ratings and scores. Users understand legal actions that could affect a company's ability to continue as an ongoing concern. A company's rating is negatively impacted when a bankruptcy takes place. Users are notified about all companies affected in a corporate family when a bankruptcy occurs within the corporate family.

In an example database 118, complete coverage of public company financial statements and many privately held company financial statements help the user to understand financial strength. This data includes balance sheet and income statements and private company financial statements collected from certified public accountants (CPAs) or from corporate officers. In the US, for example, public company financial information is obtained from the Securities and Exchange Commission (SEC) or annual reports, 10K's and 10Q's. The database 118 has complete coverage on public companies. Most financial statements are on privately-held companies. This data has many benefits. Users understand financial strength, ability to pay on time and ability to continue as an ongoing concern. This data helps target prospects by size or financial strength.

An example database 118 has data from telephone calls that verify and enhance the third party information leading to over one and one-half million updates to the database 118 every day. This data includes interviews with business principals to verify and enhance information from other sources. Every public company is monitored daily. There is a focus on collecting value-added data (e.g., business name, address, telephone number, SIC, employee number, sales, CEO/owner name). This has many benefits. It serves as an additional check on the accuracy of the data, helps validate third party data, builds content on small businesses, and makes the data consistent across the globe. Consistency of data enables customers to rely on the same high quality of information country to country, creating opportunity for growth, consistency in credit and marketing policies globally, understanding risk exposure, marketing opportunity and reliance on suppliers globally.

The URL file is collected from external and internal sources. Each URL is mined several times a year to confirm its status (live, parked, under construction, redirect, inactive) and verify it belongs to the company it has been assigned to using the name, address or telephone number from the existing database. Besides verification several times a year, additional data elements such as security data, certificate data, strength of encryption and other data are collected from the URL. The verified URLs are populated in the database using one-down linkage to expand coverage across family tree members.

FIG. 6 shows some additional sources of data used by data collection driver 108 for increased accuracy, such as telephone company data 602, internet, news and media 604, direct investigations 606, company financial information 608, payment data 610, courts and legal filings offices 612, government registries 614, and diversity data 616. This completeness of information aids profitable business decisions. In risk management, a user assesses risk from non-United States (U.S.) companies with the resulting information. Risk from small business customers can be more completely identified. The user can make more informed risk decisions when they are based on more complete information. In sales and marketing, the user can identify new prospects from data drawn from multiple sources. The user can gain access to international customers and prospects and cherry pick a prospect list with value-added information such as standard industrial classification (SIC) and contact name. In supply management, the user may assess risk from foreign suppliers with the resulting information and identify the risk from suppliers more completely. The user gains a fresher, more complete picture of each customer, prospect, and supplier because of daily updates to database 118.

In an example, telephone company data is collected to identify new businesses and changes in existing records and to provide updated contact information. Businesses request new listings when initiating phone service. The benefits of this data include indication of a new business or change in phone number and enabling creation of new records or enhancing existing ones, providing the most recent address, phone number, and line of business (SIC) information.

In an example, database 118 includes business registrations from state government registries to verify legal name and ownership to identify potential frauds. Database 118 has information on business registrations filed at the time a company is incorporated. This has many benefits. It enables verification of the existence of registered businesses, confirms information, such as a company's organizational structure, date, and state of incorporation (or organization), aids in fraud investigation through review of names and principals and business standing within a state, and identifies all changed records and new-to-file records.

Quality assurance 102 of database 118 ensures accuracy, completeness, timeliness, and cross-border consistency of global data. Quality assurance includes standardizing data, correcting and updating data, ensuring phone numbers connect and mailing addresses deliver to the intended recipient, and conducting manual reviews.

Quality assurance 102 includes standardizing data. Numerous quality edits and validations are made at the time of data entry. Data is validated to ensure consistency between branch and headquarter names, ensure reasonability between number of employees, sales volume and line of business, prevent duplication of records, validate out-of-business status changes and more. Global cleansing software is used to standardize marketable records and ensure consistency in presentation of records. Addresses are standardized before inclusion in the database.
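As an illustration of entry-time edits of this kind, the following Python sketch checks reasonability between employee count and sales volume and consistency between branch and headquarters names. The field names and the sales-per-employee ceiling are hypothetical values chosen for the example, not figures from the specification.

```python
def normalize_name(name):
    """Lowercase and collapse whitespace so names compare consistently."""
    return " ".join(name.lower().split())


def validate_record(rec, max_sales_per_employee=5_000_000):
    """Return a list of reasonability issues found in one business record.

    `max_sales_per_employee` is an assumed ceiling used only for this sketch.
    """
    issues = []
    employees = rec.get("employees") or 0
    sales = rec.get("annual_sales") or 0

    # Reasonability between number of employees and sales volume.
    if employees <= 0 and sales > 0:
        issues.append("sales reported with no employees")
    elif employees > 0 and sales / employees > max_sales_per_employee:
        issues.append("sales volume unusually high for employee count")

    # Consistency between branch and headquarters names.
    hq_name = rec.get("headquarters_name")
    if rec.get("is_branch") and hq_name:
        if normalize_name(rec["name"]) != normalize_name(hq_name):
            issues.append("branch name differs from headquarters name")

    return issues
```

A record that returns a non-empty list could be routed to manual review rather than rejected outright.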

Quality assurance 102 includes correcting and updating data. In an example, the status of suits, liens, judgments and bankruptcy filings is reviewed and updated. Data flows between internal teams to ensure information is consistently updated between areas of news, risk, ratings and delivery. Constantly updating and refreshing the data leads to high response rates on customer acquisition promotions, high match rates between files and high quality data in the database 118.

Quality assurance 102 includes manual reviews. Third party data is validated with manual reasonability reviews. Payment re-checks are manually performed on trade references appearing abnormal or exaggerated. Financial statements are reviewed to identify high risk businesses, ensure accuracy and apply capital strength ratings consistently across the universe of records. Comparisons of merger/acquisition update volumes are done with externally published numbers to ensure complete coverage.

Data is continually refined and enhanced through quality assurance 102 and global data collection 108.

Entity Matching

FIG. 7 shows how multiple unmatched pieces of data 702 may be turned into a complete single business 704. Entity matching driver 110 checks the incoming data 104 to see if it belongs to any existing business in database 118. In this example, ABC, Inc., Chuck's Mini-Mart, and Charles Smith appear to be separate companies, but after entity matching, it is clear that they are all part of one enterprise, ABC Inc. and Chuck's Mini-Mart. The different addresses and other associated information are also reconciled into a complete single business 704.

There are many benefits from entity matching driver 110. Entity matching driver 110 detects similarities in incoming data and combines it into a single business. Queries are more likely to be accurate; customer, supplier, and prospect information is consolidated to provide more complete and accurate profiles; and there are fewer duplicate records. In addition, the customer can receive information about the quality of their matched records via D&B's matching feedback mechanisms, allowing the customer to decide how to use the matched information in their business processes. Another benefit is that the customer receives a consistent answer as the matching process is repeatable and defined.

FIG. 8 shows how incoming data 104 that matches a business in database 118 is appended to that business through entity matching driver 110. Another case is shown in FIG. 9, where incoming data 104 that does not match any business in database 118 is sourced through internal and external data sources and matched to an emerging business or, as shown in FIG. 10, is assigned an identification number and held in a single source repository as learnings are gained on the entity. Entity matching driver 110 is designed to match data to the right business every time, thus increasing efficiency. Entity matching driver 110 provides more complete and accurate profiles of customers, prospects, and suppliers and ensures far fewer duplicate businesses.

FIG. 11 shows an example method of matching via entity matching driver 110. This method includes cleaning, parsing, and standardizing 1102, performing candidate retrieval 1104, and evaluation and decision making 1106. Cleaning and parsing 1102 includes identifying key components of inquiry data 1108, normalizing and standardizing name, address, and city 1110, performing name consistency 1112, and performing address standardization 1114. Candidate retrieval 1104 includes gathering possible match candidates from a reference database 1116, using optimized keys to improve retrieval quality and throughput 1118, and optimizing retrieval based on data provided in the inquiry data, observations of existing reference data and ongoing tuning 1120. Evaluation and decision making 1106 includes evaluating matches according to a consistent standard 1122, applying a match grade 1124, applying a confidence code 1126, and applying a confidence percentile 1128.
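The three phases of FIG. 11 can be sketched in Python as shown below. This is a simplified stand-in, not the matching technology of the invention: the suffix list, the blocking key (first three characters of the standardized name plus city), the 60/40 weighting, and the 1 to 10 confidence scale are all assumptions made for illustration. String similarity uses `difflib.SequenceMatcher` from the Python standard library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes dropped during standardization.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company", "llc", "ltd"}


def standardize(text):
    """Clean and parse: lowercase, strip punctuation, drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def blocking_key(name, city):
    """A simple retrieval key: name prefix plus standardized city."""
    return (standardize(name)[:3], standardize(city))


def build_index(reference_records):
    """Group reference records by blocking key for fast candidate retrieval."""
    index = {}
    for rec in reference_records:
        index.setdefault(blocking_key(rec["name"], rec["city"]), []).append(rec)
    return index


def similarity(a, b):
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def match(inquiry, index, min_confidence=7):
    """Evaluate candidates and return (best_record_or_None, confidence)."""
    candidates = index.get(blocking_key(inquiry["name"], inquiry["city"]), [])
    best, best_conf = None, 0
    for cand in candidates:
        score = (0.6 * similarity(inquiry["name"], cand["name"])
                 + 0.4 * similarity(inquiry["address"], cand["address"]))
        confidence = max(1, round(10 * score))  # assumed 1-10 scale
        if confidence > best_conf:
            best, best_conf = cand, confidence
    if best_conf >= min_confidence:
        return best, best_conf
    return None, best_conf
```

For example, an inquiry for "ABC Incorporated" at "12 Main St." would be compared only against reference records sharing its key, and accepted only if the resulting confidence meets `min_confidence`.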

To ensure quality assurance 102 of entity matching 110, manual and automated checks are performed. Samples of matched records are manually reviewed. Based on experience, customer feedback and learnings, entity matching 110 is recalibrated. Entity matching 110 allows and corrects for variations in spelling, formats, trade names, addresses, and the like. Entity matching 110 uses a match grade and confidence code to determine if the match passes the quality threshold. Entity matching 110 provides a consistent, repeatable process that is not based on human judgment. The benefits are more accurate matches and fewer duplicates.

Quality assurance 102 of entity matching 110 includes continually refining and enhancing entity matching 110 based on customer feedback. Samples of matched records are manually reviewed, and technology allows for corrections in spelling, formats, trade names, and addresses. Technology also interprets context of key parts of the inquiry to better find difficult matches (e.g., interpreting parts of the sound, geographic position, implied line of business, acronyms). Quality is also ensured by using a customized retrieval approach for each inquiry that looks at the best way to find a match to optimize the result for each unique inquiry (e.g., some matches are better made by using sound algorithms, other matches are better made by using exact name matches). As enhancements are made, they become available both online and in batch systems to ensure consistency. The benefits of these improvements are increased search candidates, additional functionality and increased throughput. In other words, more hits, better hits, and better hits faster. Matching capabilities include matches to a proprietary database containing multiple names and addresses per record, the ability to identify matches that don't look exactly like each other, and the ability to select by the quality of the match.

DUNS Number

Identification (ID) number driver 112 appends a unique identification number to every business location so it can be easily and accurately identified. This identification number is non-indicative. One example of the unique identification number is the D-U-N-S® Number available from Dun & Bradstreet headquartered in Short Hills, N.J., which is a nine-digit number that allows business locations to be easily tracked through changes and updates. The identification number is retained for the life of a business. No two business locations ever receive the same identification number and the identification numbers are never recycled. The identification number acts as an industry standard for business identification. It is endorsed by the United Nations, the European Commission, and over fifty industry groups.
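The properties stated above (nine digits, non-indicative, unique, never recycled) can be illustrated with a small Python sketch. The `IdRegistry` class and its random allocation scheme are hypothetical and are not a description of how D-U-N-S® Numbers are actually generated; the sketch only shows one way to keep retired numbers from being reissued.

```python
import random


class IdRegistry:
    """Issues nine-digit identifiers that carry no meaning and are never reused."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._issued = set()  # every number ever issued, active or retired
        self._active = {}     # number -> business location key

    def assign(self, location_key):
        """Issue a new number for a business location."""
        while True:
            # Random digits keep the number non-indicative of the business.
            number = f"{self._rng.randrange(10**9):09d}"
            if number not in self._issued:
                break
        self._issued.add(number)
        self._active[number] = location_key
        return number

    def retire(self, number):
        """Mark a number inactive; it stays in the issued set, so it is not recycled."""
        self._active.pop(number, None)

    def is_active(self, number):
        return number in self._active

    def was_issued(self, number):
        return number in self._issued
```

Keeping the issued set separate from the active mapping is the design choice that prevents recycling: a retired number can no longer be looked up as active, yet it can never be assigned again.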

The identification number is a central concept in the data processing method according to the present invention. For quality assurance, the identification number allows verification of information at every stage of the process. For data collection driver 108, if data is not linked to an existing identification number, it indicates the possibility of a new business. For entity matching driver 110, the identification number allows new data to be accurately matched to existing businesses. For corporate linkage driver 114, corporate families are assembled based on each business' identification number. For predictive indicators driver 116, the identification number is used to build predictive tools.

Additionally, the identification number opens new areas of opportunity to a user's business by helping to verify that a business exists and validating the business location. Users are provided a complete view of prospects, customers, and suppliers. Existing data is clarified, duplication is identified, and related businesses are shown to be related. Users can more easily manage large groups of customers or suppliers when the identification number is appended to the user's information. The identification number enables fast and easy data updates when appended to the user's information. The identification number provides a complete view of prospects and customers by placing businesses, where applicable, within their domestic and global corporate ‘families’, identifying penetration and opportunities for up-sell and cross-sell. The identification number also helps aggregate data from multiple and disparate systems to gain better insight with one complete view of prospects, customers and suppliers.

The identification number not only helps identify duplication in files within the database, but also provides customers with a unique key that can be used to identify duplication in the customer's existing portfolio of accounts.

FIG. 13 shows an example method of identification number driver 112. Data collection 108 provides input data that is pre-processed 1300 by de-duping, appending phone and SIC, validating address and town, and checking for branches and franchises. This processed data is matched to a unique identifier file 1302. If a match is found, data is appended to an existing record in the multi-sourced file 1304. If a match is not found, the data is included in a single source repository 1306 and, then, unique identifier assignment rules are applied 1308. As new files are received and additional sources validate a record in the single source repository, that record is then included in the multi-sourced file. If new sources do not validate the record, the record stays in the single source repository.
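The routing between the multi-sourced file and the single source repository may be sketched as follows. This is a minimal illustration rather than the implementation of FIG. 13: records are represented by a hypothetical hashable key, the two files are plain dictionaries, and the rule that a second distinct source is enough to validate a record is an assumption. The unique identifier assignment rules 1308 are not modeled.

```python
def process_record(record, source, multi_sourced, single_source):
    """Route one pre-processed record between the two files.

    record: hashable key standing in for a matched business (hypothetical).
    multi_sourced: dict mapping record -> set of sources (validated records).
    single_source: dict mapping record -> set of sources (awaiting validation).
    """
    if record in multi_sourced:
        # Match found: append the new data to the existing record.
        multi_sourced[record].add(source)
        return "appended"
    sources = single_source.setdefault(record, set())
    sources.add(source)
    if len(sources) >= 2:
        # Assumed rule: a second distinct source validates the record,
        # so it moves into the multi-sourced file.
        multi_sourced[record] = single_source.pop(record)
        return "promoted"
    # Not yet validated: the record stays in the single source repository.
    return "held"
```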

Quality assurance 102 includes how identification numbers are managed. In an example, an identification number is retained for the life of a business. No two businesses ever receive the same identification number. Identification numbers are never recycled. The identification number is retained when a company moves anywhere within the same country. The identification number is preferably an industry standard for business identification.
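The management rules above (uniqueness, retention, and no recycling) may be illustrated with a small registry. The class and method names are hypothetical, and a plain sequential counter formatted to nine digits is used only for illustration; it is not the actual numbering scheme of any real identifier.

```python
import itertools

class IdRegistry:
    """Issues nine-digit identifiers that are never reissued."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._active = {}      # identifier -> business location
        self._retired = set()  # identifiers of discontinued businesses

    def assign(self, location):
        """Give a new business location the next unused identifier."""
        identifier = f"{next(self._counter):09d}"
        self._active[identifier] = location
        return identifier

    def relocate(self, identifier, new_location):
        """A move within the same country keeps the same identifier."""
        self._active[identifier] = new_location

    def retire(self, identifier):
        """Remove from active use but remember it so it is never reused."""
        del self._active[identifier]
        self._retired.add(identifier)

    def is_issued(self, identifier):
        return identifier in self._active or identifier in self._retired
```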

Quality assurance 102 of identification number driver 112 includes validation and protection against duplication. Rigorous processing is done to identify duplicate identification numbers, including using duplicate scoring systems, implementing controls around bulk file building, and undergoing validations prior to entering the database. In an example, every business is validated before it is included in database 118 so that the address is based on postal standards, incoming records are validated in relation to a town file (e.g., address, city, ZIP, state, and telephone number), and phone number and line of business are verified. Records are validated against multiple sources because, for example, a business registration alone does not always indicate that a business has begun operations.

Quality assurance 102 of identification number driver 112 includes refining and enhancing the identification number assignment process.

Corporate Linkage

FIGS. 14-16 show how corporate linkage driver 114 builds corporate linkage to reveal how companies are related. Without corporate linkage, the companies L Refinery Div. 1402, C Stores Inc. 1404, and G Storage Div. 1406 in FIG. 14 appear to be unrelated.

As shown in FIG. 15, however, applying corporate linkage allows the entire corporate family to be viewable without limit in depth or breadth. Parent company U Products Group Corp. 1502 has three subsidiaries under it, L Inc. 1504, C Inc. 1506, and G Inc. 1508. L Inc. 1504 has two branches, L Storage Div. 1510 and L Refinery Div. 1402 (shown in FIG. 14). C Inc. 1506 has two branches, Industrial Co. 1512 and Building Co. 1514, and a subsidiary, C Stores Inc. 1404 (shown in FIG. 14). G Inc. 1508 has two branches, G Storage Div. 1406 (shown in FIG. 14) and G Refinery Div. 1516. C Stores Inc. has four branches, North Store Inc. 1518, South Store Inc. 1520, West Store Inc. 1522, and East Store Inc. 1524. Building extensive corporate linkage allows a business information provider to be an industry leader by providing this complete detail.

FIG. 16 shows how corporate linkage driver 114 updates family trees after mergers and acquisitions. In this example, two separate businesses, ABC 1602 and XYZ 1604, exist before a merger and each has its own subsidiaries and branches. After the merger, ABC XYZ 1606 has two subsidiaries, ABC subsidiary 1608 and XYZ subsidiary 1610, each with its own branches and/or subsidiaries.

Corporate linkage driver 114 opens up profitable opportunities in risk management, sales and marketing, and supply management for a user. It allows the user to understand the total risk exposure and regulatory and statutory compliance implications across a corporate family. The user recognizes the relationship between bankruptcy or financial stress in one company and the rest of its corporate family. The user increases sales by up-selling and cross-selling within a corporate family. The user reduces expenses by reducing research time. The user can maximize the opportunity based on revenues from an entire corporate family. The user can understand where purchase decisions are made. The user can identify possible conflicts of interest. The user can determine its total spend with a corporate family to better negotiate.
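As a simple illustration of determining total exposure or total spend across a corporate family, account-level amounts can be rolled up to the top of each family once linkage is known. The sketch below assumes linkage is available as a dictionary mapping each entity to its parent; the function name and the sample amounts in the usage note are hypothetical.

```python
from collections import defaultdict

def totals_by_family(parent_of, amounts):
    """Sum per-account amounts under each family's top-level parent.

    parent_of: dict mapping an entity to its parent; top-level entities
        and unlinked entities do not appear as keys.
    amounts: dict mapping an entity to the amount owed or spent.
    """
    def top_of_family(entity):
        # Follow parent links until an entity with no parent is reached.
        while entity in parent_of:
            entity = parent_of[entity]
        return entity

    totals = defaultdict(float)
    for entity, amount in amounts.items():
        totals[top_of_family(entity)] += amount
    return dict(totals)
```

With the family of FIG. 15, separate accounts held with L Refinery Div., C Stores Inc., and G Storage Div. would be reported as a single total under U Products Group Corp., while an unlinked business would remain on its own.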

FIG. 17 shows an example method of performing corporate linkage driver 114. Generally, it shows a method of updating family tree linkage 1700 where the goal is to correctly link all subsidiaries and branches of each entity having an identification number with consistent names, tradestyles, and correct employee numbers, while resolving all look-a-likes (LALs).

Members of a corporate family are identified by their relationship to other members. In an example, members include a global ultimate, a domestic ultimate, parent corporations, subsidiaries, headquarters, and branches. A global ultimate is the highest-ranking member of a corporate family globally. A domestic ultimate is the highest-ranking member of a corporate family within a specific country. A parent corporation is a company that owns more than half of another company. A subsidiary is a company that is more than half owned by a parent company. A headquarters is a company with reporting branches or divisions. A branch is a secondary location or operation, not a separate entity.
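These definitions can be illustrated with a minimal tree in which each member points to its parent or headquarters. The class and function names are hypothetical, and the interpretation of a domestic ultimate as the highest-ranking member in the same country along a member's own ownership chain is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entity:
    name: str
    country: str
    parent: Optional["Entity"] = None  # owning parent or headquarters

def global_ultimate(entity):
    """Walk up the ownership chain to the top of the family."""
    while entity.parent is not None:
        entity = entity.parent
    return entity

def domestic_ultimate(entity):
    """Highest-ranking member on the chain in the entity's own country."""
    country = entity.country
    top = entity
    node = entity
    while node is not None:
        if node.country == country:
            top = node
        node = node.parent
    return top
```

For example, if a branch and its immediate parents are located in one country and the family's top-level parent is located in another, the global ultimate is the top-level parent while the domestic ultimate is the highest parent in the branch's own country.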

FIG. 18 shows how other relationships among DUNS-numbered entities can be overlaid onto the corporate linkage view to enrich the overall understanding of a group of otherwise potentially independent entities. Examples of this include franchise relationships, associations, co-ops, agents, dealers, chapters and affiliated concerns.

Quality assurance 102 during corporate linkage 114 increases the completeness and accuracy of corporate families. In an example, a dedicated team reviews corporate families. This ensures business names, tradestyles, and SICs are consistent within a corporate family. Quality assurance 102 includes checking for duplicates. The largest global family trees are reviewed and updated centrally. Changes are monitored to identify and track mergers and acquisitions and other major events. Quality assurance 102 includes matching of corporate families. There are quality programs to ensure business entities are linked properly and to handle linkage breaks within a corporate tree. Corporate linkage is done through legal ownership. Quality assurance 102 of corporate linkage 114 includes continually refining and enhancing corporate linkage based on customer feedback. Corporate linkage 114 capabilities include global cross-border linkage, U.S. linkage, public company linkage, private company linkage, and linkage defined by legal ownership versus business name. Quality assurance processes include using a validation tool to identify erroneously unlinked records or ‘look-a-likes’. The quality assurance processes are continually refined and enhanced based on learnings, feedback and reviews.

Predictive Indicators

Predictive indicator driver 116 summarizes the information collected on a business and uses it to predict future performance. Predictive indicators use statistical analysis to indicate the likelihood of a business to perform in a specific way in the future. There are many benefits to predictive indicators. Users can make faster, more consistent decisions by allowing automated decisions for increased efficiency. Users can free up resources to look at time-intensive borderline decisions. Users can make more consistent decisions across the entire organization. Users can allow faster processing of large volumes of transactions. Users can apply scores across an entire portfolio to quickly identify risk and opportunity. Users can help estimate demand to target the right prospects and reduce acquisition costs.

There are three types of predictive indicators: descriptive ratings, predictive scores, and demand estimators. Descriptive ratings summarize how a customer has historically been paying bills. Predictive scores are a prediction of how likely it is for a business to pay promptly or continue as an ongoing concern. Demand estimators estimate how much of a product a business is likely to buy in total (response, approval, look-a-like models).

Predictive indicators help a user to accelerate and impact profitability in all areas of its business. In risk management, descriptive ratings and predictive scores help the user grant or approve credit. A rating indicates creditworthiness of a company based on past financial performance. A score indicates likelihood of a business to continue as an ongoing concern or pay on time. Predictive scores can be applied across the user's whole portfolio to quickly identify high-risk accounts and begin aggressive collection immediately or to evaluate the creditworthiness of each applicant. A commercial credit score predicts the likelihood of a business paying slowly over the next twelve months. A financial stress score predicts the likelihood of a business failing over the next twelve months. In sales and marketing, look-a-like models, response models and demand estimators let a user identify prospects that look like its best customers, identify who is likely to respond to an offer, and/or estimate how much product they will buy, so that the user can prioritize opportunities among customers or prospects. Examples of demand estimators include number of personal computers and local or long distance spending. In supply management, predictive scores can be applied to all of a user's suppliers to quickly understand their risk of failing in the future.

In addition, predictive scores may be customized according to a user's specific need and criteria. For example, criteria may be used, such as (1) what behavior does the user want to predict; (2) what is the size of the business the user wants to assess; and (3) what are the decision rules based on the user's risk tolerance to translate risk assessment into a credit decision or risk management or marketing action.
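The third criterion, decision rules that translate a risk assessment into an action, can be illustrated with a pair of user-chosen cutoffs. The function name, the action labels, and the convention that a higher score means lower risk are assumptions made for this sketch.

```python
def credit_decision(score, approve_at, decline_below):
    """Map a risk score to an action using cutoffs set by the user.

    Higher scores are assumed to mean lower risk. Scores between the
    two cutoffs are treated as borderline and sent for manual review.
    """
    if approve_at <= decline_below:
        raise ValueError("approve_at must be greater than decline_below")
    if score >= approve_at:
        return "approve"
    if score < decline_below:
        return "decline"
    return "manual review"
```

A user with a low risk tolerance would raise both cutoffs, so that fewer applicants are approved automatically and more are declined or reviewed.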

Predictive indicators are enabled by analytic capability and data capability. For example, a dedicated team of experienced business-to-business (B2B) expert PhDs may build the underlying predictive models and have access to industry-specific knowledge, financial and payment information, and extensive historical information for analysis.

FIGS. 19 and 20 show an example method of creating a predictive indicator. It starts with market analysis 1802 and then there is a business decision on model development 1804. This decision involves the type of score to be developed and output at the end, such as a failure risk score, a delinquency risk score, or an industry specific score. The failure risk score is the likelihood that a company will cease operations. The delinquency risk score is the likelihood that a company will pay late. The industry specific score predicts something particular, such as the likelihood of using copiers or truckers or whether a company is a good credit risk. Input data 1806 is gathered from an archive of credit database 1808 and a trade tape database 1810, which provide historical data related to credit. There are two time periods of concern: an observation period, which is a historical look at all the facts, and a performance period, which is a time period just after that to see what happened. For example, given data in the previous year, how did a company perform with respect to a certain time period in the current year? The next step refers to a risk to be evaluated, such as a financial stress score that predicts the likelihood of a business failing in the next twelve months.
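The split between the observation period and the performance period may be sketched as follows. The event tuples, the event kinds "late_payment" and "failure", and the choice of a late-payment count as the observed input are hypothetical; the sketch only shows that inputs are drawn from the earlier window and the outcome from the later one.

```python
from datetime import date

def build_sample(events, observation_end, performance_end):
    """Split dated events into a model input and an outcome label.

    events: iterable of (business_id, event_date, kind) tuples.
    Returns {business_id: (late_payments_observed, failed_in_performance)}.
    """
    sample = {}
    for business_id, event_date, kind in events:
        observed, failed = sample.get(business_id, (0, False))
        if event_date <= observation_end:
            # Observation period: facts known at the time of scoring.
            if kind == "late_payment":
                observed += 1
        elif event_date <= performance_end and kind == "failure":
            # Performance period: what actually happened afterwards.
            failed = True
        sample[business_id] = (observed, failed)
    return sample
```

Events dated after the end of the performance period are ignored, so a failure that occurs outside the window does not count as an outcome for that sample.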

A development sample is selected from a business universe 1814, a demographic profile is created of the business universe 1816, and exploratory data analysis is performed 1818 (univariate analysis of all variables). Tasks are performed such as determining the relationships between each variable and what is being predicted, the range of a variable, the type of variable, including or not including variables, and other functions related to understanding what to put in the model. Variables may be selected in accordance with the observation period and the performance period, and weights may be assigned to indicate accuracy or representativeness. Trends are factored in. Quality assurance includes periodically checking to see if anything in the business universe affects the initial model and taking a score and running it against a prior period to check that it is still indicative or predictive.
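One common form of univariate analysis, offered here only as an illustration, is to group a single candidate variable into ranges and compare the rate of the predicted outcome in each range. The function name and the binning scheme are assumptions; the description does not specify how the analysis is carried out.

```python
def outcome_rate_by_bin(values, outcomes, edges):
    """Outcome rate within each range of one candidate variable.

    values: the variable's value for each business in the sample.
    outcomes: 1 if the predicted event occurred for that business, else 0.
    edges: ascending upper bounds; one open-ended bin is added at the end.
    Returns one rate per bin, or None for a bin with no businesses.
    """
    counts = [0] * (len(edges) + 1)
    events = [0] * (len(edges) + 1)
    for value, outcome in zip(values, outcomes):
        # The bin index is the number of upper bounds the value exceeds.
        index = sum(value > edge for edge in edges)
        counts[index] += 1
        events[index] += outcome
    return [e / c if c else None for e, c in zip(events, counts)]
```

A variable whose outcome rate changes steadily from bin to bin is a candidate for inclusion in the model, while a flat profile suggests the variable adds little.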

Continuing on FIG. 19, statistical analysis and model development processes including logistic regression and other estimation techniques 1820 are performed. This step includes applying the appropriate models, formulas, and statistics. Next, statistical coefficients are converted into a scorecard 1822. Models are tested and validated 1824, and technical specifications are developed 1826. Finally, the model is implemented 1828 and tested 1830. Data is run through the model to generate a score. Periodically, checks are performed to verify that the score is still valid and to determine if the scorecard needs to be updated.
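The conversion of statistical coefficients into a scorecard is not detailed above. One widely used convention, assumed here purely for illustration, scales the log-odds from a logistic regression so that a fixed number of points doubles the odds. The function names and the default values (a base score of 600 at odds of 50 to 1, with 20 points to double the odds) are hypothetical.

```python
import math

def make_scorecard(intercept, coefficients, base_score=600, base_odds=50, pdo=20):
    """Convert logistic-regression coefficients into additive points.

    The model is assumed to predict the log-odds of good performance:
        log_odds = intercept + sum(coefficient * value)
    Returns (base_points, points_per_unit) where points_per_unit maps each
    variable name to the points added per unit of that variable.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    base_points = offset + factor * intercept
    points = {name: factor * coef for name, coef in coefficients.items()}
    return base_points, points

def score(scorecard, values):
    """Run one business's data through the scorecard to produce a score."""
    base_points, points = scorecard
    return base_points + sum(points[name] * values[name] for name in points)
```

Under this scaling, a business whose modeled odds equal the base odds receives the base score, and each doubling of the odds adds the chosen number of points.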

Quality assurance 102 of predictive indicators 116 includes continually monitoring and adjusting predictive indicators to reflect new information. In an example, this includes periodic testing of predictiveness, continuous manual refinement and recalibration, automated changes, monthly audits and annual validation, and analyzing data for each model with respect to its predictive qualities and importance whenever models are created or updated. Also, predictive indicators are continually refined and enhanced based on customer feedback. Predictive indicators 116 draw on deep data, including demographic data, payment information, detailed public record information, such as suits, liens, judgments, bankruptcies, and UCC filings, public and private company financial information, and linkage data used to assign risk to the responsible entity (e.g., scoring branches with HQ data). An independent group of reviewers checks and validates the results of the scores, from which continual refinement and enhancement is realized. Customer needs and industry trends are also considered when quality assurance processes are done to continually improve the models and scores.
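Periodic testing of predictiveness can take many forms. One simple check, assumed here only as an example, is to take scores from a prior period and measure how often a business that later failed had received a lower score than one that survived. The function name and the convention that lower scores indicate higher risk are hypothetical.

```python
def concordance(scores, failed):
    """Share of (failed, surviving) pairs where the failed business scored lower.

    Ties count as half. A value of 0.5 is no better than chance; 1.0 means
    every failed business scored below every surviving business.
    """
    bad = [s for s, f in zip(scores, failed) if f]
    good = [s for s, f in zip(scores, failed) if not f]
    if not bad or not good:
        raise ValueError("need at least one failed and one surviving business")
    wins = sum((b < g) + 0.5 * (b == g) for b in bad for g in good)
    return wins / (len(bad) * len(good))
```

A value that drifts toward 0.5 over successive periods would suggest that the scorecard needs to be recalibrated or rebuilt.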

The present invention has many advantages. Preferably, a global database used to perform a method of data integration encompasses millions of records and is updated daily. Users gain a fresher, more complete picture of each of their customers, prospects, and suppliers, because of the large number of daily updates to the database. Users are able to assess the risk of non-U.S. companies, because the database has global data. Users can more completely identify the risk from small business customers. Users make more informed risk decisions. Users identify new prospects from data drawn from multiple sources. Users gain access to international customers, suppliers and prospects. Users receive enhanced prospect lists with value-added information, such as line of business and contact name. Users can assess risk from foreign suppliers. Users can more completely identify the risk from suppliers.

It is to be understood that the above description is intended to be illustrative and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. Various embodiments for performing data collection, performing entity matching, applying an identification number, performing corporate linkage and providing predictive indicators are described. The present invention has applicability to applications outside the business information industry. Therefore, the scope of the present invention should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

1. A system of data integration, comprising: a data collection driver for collecting business information; an entity matching driver for matching said business information to data in a database that is associated with a business entity; an identification number driver for assigning a unique identifier to said business entity; a corporate linkage driver for providing linkage of said business entity to a corporate family; and a predictive indicator driver for providing statistical and analytical information about the likelihood that said business will perform in a specific way in the future; wherein said data collection driver, said entity matching driver, said identification number driver, said corporate linkage driver, and said predictive indicator driver produce quality information for business decisions.
2. The system according to claim 1, wherein said data collection driver collects data from a plurality of sources, said sources including private company financial data, public company financial data, courts and legal filing office data, telephone company data, direct investigation data, government registries, diversity data, payment data, news, media and the Internet.
3. The system according to claim 1, wherein said data collection driver involves updating the data daily to ensure fresh, accurate data.
4. The system according to claim 1, wherein said data collection driver includes collecting the data from strategic partners, in addition to collecting raw data.
5. The system according to claim 1, wherein said data collection driver encompasses global information to ensure data is consistently available across the globe.
6. The system according to claim 1, wherein said data collection driver involves rigorous quality assurance practices to ensure the data coming in from various sources is accurate, timely, complete and consistent.
7. The system according to claim 1, wherein said entity matching driver matches to a proprietary database containing multiple names and addresses per record.
8. The system according to claim 1, wherein said entity matching driver identifies and removes duplicates.
9. The system according to claim 1, wherein said entity matching driver is able to identify matches that are not identical.
10. The system according to claim 1, wherein said entity matching driver is able to select a match by a confidence measure.
11. The system according to claim 1, wherein said entity matching driver uses a process of cleaning, parsing and standardizing, performing candidate retrieval, and evaluation and decision making as each match is made.
12. The system according to claim 1, wherein said entity matching driver is quality assured by constantly refining and returning match parameters based on customer feedback, manual reviews, match learnings and advances in match technologies.
13. The system according to claim 1, wherein said identification number driver assigns said unique identifier to a single, unique business location and never recycles said unique identifier.
14. The system according to claim 1, wherein said identification number driver uses said unique identifier to track said business through mergers, acquisitions, control changes and the entire business life cycle, including business discontinuance.
15. The system according to claim 1, wherein said identification number driver is used to aggregate data and identify duplicates in said database.
16. The system according to claim 1, wherein said identification number driver is a nonindicative number.
17. The system according to claim 1, wherein said identification number driver can be used to identify duplication in customer files as well as in said database.
18. The system according to claim 1, wherein said identification number driver plays a key role in each of the five drivers involved in this data integration process as it is an identification key for a business record as it travels through the entire process.
19. The system according to claim 1, wherein said identification number driver undergoes rigorous quality assurance procedures to protect against duplication and maintain the integrity of the numbering system.
20. The system according to claim 1, wherein said corporate linkage driver provides cross-border linkage.
21. The system according to claim 1, wherein said corporate linkage driver provides public company linkage.
22. The system according to claim 1, wherein said corporate linkage driver provides private company linkage.
23. The system according to claim 1, wherein said corporate linkage driver defines linkage by ownership.
24. The system according to claim 1, wherein said corporate linkage driver can also be referenced to provide details of relationships outside of corporate ownership, including affiliations, associations, co-ops, franchises, agents, dealers and chapters.
25. The system according to claim 1, wherein said corporate linkage driver uses quality assurance procedure to improve the accuracy, completeness, timeliness and consistency of corporate linkage data.
26. The system according to claim 1, wherein said predictive indicator driver uses payment information.
27. The system according to claim 1, wherein said predictive indicator driver uses public records.
28. The system according to claim 1, wherein said predictive indicator driver uses public and private company financial information.
29. The system according to claim 1, wherein said predictive indicator driver uses linkage data to assign risk to a responsible entity.
30. The system according to claim 1, wherein said predictive indicator driver can be selected from the group consisting of: descriptive ratings, predictive scores, and demand estimators.
31. The system according to claim 1, wherein said predictive indicator driver are enabled by analytic capability and data capability.
32. The system according to claim 1, wherein said predictive indicator driver are continually monitored and adjusted to improve the overall quality and accuracy.
33. A method of data integration, comprising: collecting business information; matching said business information to data in a database that is associated with a business entity; assigning a unique identifier to said business entity; providing linkage of said business entity to a corporate family; providing statistical and analytical information as predictive indicators about the likelihood that said business will perform in a specific way in the future; and producing quality business and financial information for business decisions.